Validating smartphone-collected speech corpora
Abstract
We investigate the effectiveness with which the accuracy of a prompted speech corpus can be validated when minimal additional speech resources are available, and specifically when a language model in the target language is not available. We compare a word-based variant of Goodness of Pronunciation (GOP) with a phone-based dynamic programming (PDP) scoring technique. The first technique uses the acoustic likelihood ratio and the second the optimal alignment between an observed phone string (generated by a speech recogniser) and a reference phone string (obtained from a dictionary) to generate validation scores. We define a new technique to obtain a PDP scoring matrix in a data-driven fashion, examine different ways of using GOP for word scoring, and find that variants of both techniques provide results that are effective for corpus validation.
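To make the PDP idea concrete, here is a minimal sketch of scoring an observed phone string against a reference phone string by dynamic programming. It is an illustration under stated assumptions, not the authors' implementation: the function name `pdp_score`, the default match, mismatch and gap values, the normalisation by reference length, and the example phone symbols are all hypothetical. In the paper the scoring matrix is derived from data; here it is passed in as a plain dictionary.

```python
from typing import Dict, List, Tuple


def pdp_score(
    observed: List[str],
    reference: List[str],
    score_matrix: Dict[Tuple[str, str], float],
    default_match: float = 1.0,      # assumed value, not from the paper
    default_mismatch: float = -1.0,  # assumed value, not from the paper
    gap_penalty: float = -1.0,       # assumed cost of an insertion or deletion
) -> float:
    """Best global alignment score, divided by the reference length."""

    def substitution(obs_phone: str, ref_phone: str) -> float:
        # Prefer an entry from the scoring matrix; otherwise fall back
        # to the flat match/mismatch defaults.
        if (obs_phone, ref_phone) in score_matrix:
            return score_matrix[(obs_phone, ref_phone)]
        return default_match if obs_phone == ref_phone else default_mismatch

    n, m = len(observed), len(reference)
    # dp[i][j] holds the best score for aligning the first i observed
    # phones with the first j reference phones.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap_penalty

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + substitution(observed[i - 1], reference[j - 1]),
                dp[i - 1][j] + gap_penalty,  # extra phone in the recogniser output
                dp[i][j - 1] + gap_penalty,  # reference phone missing from the output
            )

    # Normalising by reference length is an assumption made here so that
    # prompts of different lengths give comparable scores.
    return dp[n][m] / max(m, 1)


if __name__ == "__main__":
    reference = ["k", "ae", "t"]
    print(pdp_score(["k", "ae", "t"], reference, {}))
    print(pdp_score(["k", "ae", "d"], reference, {("d", "t"): 0.5}))
```

An utterance whose score falls below a chosen threshold would be flagged as a likely mismatch with its prompt. A matrix that penalises acoustically similar confusions less than dissimilar ones makes the score more tolerant of recogniser errors.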
Similar resources
Building ASR Corpora Using Eyra
Building acoustic databases for speech recognition is very important for under-resourced languages. To build a speech recognition system, a large amount of speech data from a considerable number of participants needs to be collected. Eyra is a toolkit that can be used to gather acoustic data from a large number of participants in a relatively straightforward fashion. Predetermined prompts are ...
HTIMIT and LLHDB: speech corpora for the study of handset transducer effects
This paper describes two corpora collected at Lincoln Laboratory for the study of handset transducer effects on the speech signal: the handset TIMIT (HTIMIT) corpus and the Lincoln Laboratory Handset Database (LLHDB). The goal of these corpora is to minimize all confounding factors and produce speech predominately differing only in handset transducer effects. The speech is recorded directly from ...
The EASR Corpora of European Portuguese, French, Hungarian and Polish Elderly Speech
Currently available speech recognisers do not usually work well with elderly speech. This is because several characteristics of speech (e.g. fundamental frequency, jitter, shimmer and harmonic noise ratio) change with age and because the acoustic models used by speech recognisers are typically trained with speech collected from younger adults only. To develop speech-driven applications capable ...
Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks
The design and compilation of the CUCall telephone speech corpora are described in this paper. A speech database is an indispensable resource for research and development of state-of-the-art spoken language technology. These speech recognition systems rely greatly on a huge amount of well-designed and appropriately processed speech data for parameter training. On the other hand, as telephony appl...
Development of Speech corpora for different Speech Recognition tasks in Malayalam language
A speech corpus is the backbone of an automatic speech recognition system. This paper presents the development of speech corpora for different speech recognition tasks in the Malayalam language. A pronunciation dictionary and a transcription file, the other two essential resources for building a speech recognizer, are also being created. Speech recognition performance of different speech recognit...